Cortex_m backend: Simplify add + linear fusion passes #15526
Reuses the FoldAndAnnotateQParamsPass from the Arm backend to greatly simplify the logic for fusing the ops.

Additionally updates the linear kernel to be numerically correct and computes the kernel_sum ahead of time (AOT) in the quantized_linear_fusion pass. Note that since this replaces the bias node, it typically causes no extra memory usage.

Updates the Linear tests to mirror this, including removing the various matmul tests; since linear is now handled as a separate op rather than as a particular type of matmul, these tests are no longer relevant.

Removes unnecessary stub definitions in operators.py, operators.yaml and op_quantized_linear.cpp.

Leaves a few TODOs since the patch is already large.

Signed-off-by: Adrian Lundell <[email protected]>
Change-Id: I194228ee3ae4b64a92f3f818afb2e045cc3acf91
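The AOT kernel-sum computation described above can be sketched in NumPy. This is a minimal illustration of the folding idea, not the actual pass code; `fold_kernel_sum_into_bias` is a hypothetical helper name.

```python
import numpy as np

def fold_kernel_sum_into_bias(weight_q, bias_q, input_zero_point):
    """Fold the per-channel weight kernel sums into the int32 bias.

    weight_q: int8 weights, shape [out_features, in_features]
    bias_q:   int32 bias,   shape [out_features]
    """
    # kernel_sum[j] = sum_k weight_q[j, k], a per-output-channel constant
    kernel_sum = weight_q.astype(np.int64).sum(axis=1)
    # The zero-point correction -zp_x * kernel_sum is input-independent,
    # so it can be baked into the bias node ahead of time (AOT).
    return (bias_q.astype(np.int64) - input_zero_point * kernel_sum).astype(np.int32)
```

Because the folded result has the same shape and dtype as the original int32 bias, replacing the bias node adds no extra memory, as noted above.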
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/15526

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 1 Cancelled Job, 4 Unrelated Failures as of commit dd7c05e with merge base d07a49a.

NEW FAILURE - The following job has failed:
CANCELLED JOB - The following job was cancelled. Please retry:
BROKEN TRUNK - The following jobs failed but were present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.

This comment was automatically generated by Dr. CI and updates every 15 minutes.
@AdrianLundell Just want to better understand the rationale behind removing the per-channel support / scratch buffer code and, in general, the stateful-ops related code? (I do understand the FoldDQQ-related changes to the pass, though.)
The implementation didn't originally produce numerically correct results, so I wanted to fix that before adding per-channel support. Not doing that in this patch was mostly to keep the patch size down and to prioritize the most important functionality before going into details.

Regarding the stateful scratch buffer: since we compute the kernel sum that the buffer was used for AOT and replace the bias, there is no reason to do that in the runtime as I see it. The only use-case where that would make sense (using the MVE implementation) is if you are very tight on memory and prefer to spend some extra time during each inference computing the kernel sum in a scratch buffer rather than keeping it in memory. But then again, there is no reason to do everything in one single patch; better to get a good minimal working example running IMO.
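The accumulation identity that makes the runtime scratch buffer unnecessary can be sketched as follows. This is a NumPy illustration with hypothetical function names, not the actual C++ kernel: folding the zero-point correction into the bias AOT leaves a plain int8 dot product at runtime.

```python
import numpy as np

def acc_reference(x_q, w_q, bias_q, zp_x):
    # Reference int accumulation: subtract the input zero point per element.
    return (x_q.astype(np.int64) - zp_x) @ w_q.astype(np.int64).T + bias_q

def acc_folded(x_q, w_q, folded_bias):
    # With the correction folded into the bias AOT, the hot loop is a plain
    # dot product; no scratch buffer holding kernel sums is needed at runtime.
    return x_q.astype(np.int64) @ w_q.astype(np.int64).T + folded_bias
```

The two accumulations are equal because `(x - zp) @ W.T = x @ W.T - zp * kernel_sum`, and `folded_bias = bias - zp * kernel_sum` absorbs exactly that constant term.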
Also, can you add comments in the code explaining why the scratch buffer context is simplified and when it might still be needed? Overall I think this is a solid refactor in the right direction.
We will need a scratch buffer for some ops, yes, but I am not sure the header solution for keeping track of the state will ever be needed. As I see it, we can either do the computation AOT as in this case, or it will be reset with every inference as for CONV. Maybe I am missing some use-case here?
Agree we can keep it for potential future use.
The Python op implementation is tested in test_dialect_linear in backends/cortex_m/test/ops/test_linear.py, and the C++ implementation is tested on FVP in the test_implementation_linear tests. The tests to run after having downloaded executorch are:
These are available in test_linear.py. These tests were not passing before the patch.
Signed-off-by: Adrian Lundell <[email protected]>
I think the logic is well explained in the commit message + the quantized_linear_fusion_pass.py.
Thanks, looks good overall; approving as we also need to move faster. [EDIT:] Looks like there are compilation issues / CI failures (not sure if related to #15612).
Fix a merge issue causing the build to fail + update tests after merging of pytorch#15590 Signed-off-by: Adrian Lundell <[email protected]>
Failures in XNNPACK/ios-arm64-coreml-fp16 are not related.
cc @freddan80 @per @zingo @oscarandersson8218 @digantdesai